Bangla Part of Speech Tagging Using Contextual Embeddings and Oversampling Techniques

Koushik Roy, Md Hasan, K M Faizullah Fuhad, Nabeel Mohammed, A K M Shahariar Azad Rabby, Nazmul Hasan, Jebun Nahar, Fuad Rahman
Accepted to be presented at FTC 2020 - Future Technologies Conference 2020, 5-6 November 2020, Vancouver, Canada

Description

Part of Speech (PoS) tagging is a long-standing research area in Natural Language Processing. The popularization of neural networks has opened up considerably more scope for research on Bangla PoS tagging, especially with sequential models based on Recurrent Neural Networks such as Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU). Our contribution in this paper is to transform the sequential modeling problem into a non-sequential, per-token classification problem using BERT embeddings. This lets us apply existing, well-understood oversampling algorithms to improve PoS tagging with a shallow feed-forward neural network. Our experimental results indicate that the Synthetic Minority Over-sampling Technique (SMOTE) works well as an oversampling algorithm for BERT embeddings.
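
The oversampling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it is a plain-Python version of the SMOTE interpolation idea, the function names `smote` and `oversample` are hypothetical, and the 2-dimensional toy vectors stand in for per-token BERT embeddings. Treating each token's embedding as an independent sample is what makes a standard oversampler applicable. The feed-forward classifier that would be trained on the balanced data is not shown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic vectors by interpolating between minority samples."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other samples
    if k < 1:
        raise ValueError("SMOTE needs at least two samples of the minority tag")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest same-tag neighbours of the base sample (excluding itself)
        others = [v for v in minority if v is not base]
        others.sort(key=lambda v: math.dist(base, v))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # random position on the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def oversample(vectors, tags, k=5, seed=0):
    """Grow every tag to the size of the most frequent tag."""
    by_tag = {}
    for vec, tag in zip(vectors, tags):
        by_tag.setdefault(tag, []).append(vec)
    target = max(len(vs) for vs in by_tag.values())
    out_vectors, out_tags = list(vectors), list(tags)
    for tag, vs in by_tag.items():
        if len(vs) < target:
            new = smote(vs, target - len(vs), k=k, seed=seed)
            out_vectors.extend(new)
            out_tags.extend([tag] * len(new))
    return out_vectors, out_tags


# Toy data (assumption): one vector per token, as if each token's
# contextual embedding had been extracted and detached from its sentence.
vectors = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [1.0, 1.0], [1.2, 0.9]]
tags = ["NOUN", "NOUN", "NOUN", "NOUN", "INTJ", "INTJ"]

balanced_vectors, balanced_tags = oversample(vectors, tags)
```

Each synthetic vector lies on the line segment between two real embeddings of the same tag, so rare tags gain training examples without exact duplication. In practice a library implementation of SMOTE would likely be preferable to this hand-written version.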
